Abstract

We model evolution according to an asymmetric game as occurring in multiple 
finite populations, one for each role in the game, and study the effect of subjecting 
individuals to stochastic strategy mutations. We show that, when these mutations 
occur sufficiently infrequently, the dynamics over all population states simplify to an 
ergodic Markov chain over just the pure population states (where each population is 
monomorphic). This makes calculation of the stationary distribution computationally 
feasible. The transition probabilities of this embedded Markov chain involve fixation 
probabilities of mutants in single populations. The asymmetry of the underlying game 
leads to fixation probabilities that are derived from frequency-independent selection, in 
contrast to the analogous single-population symmetric-game case [1]. This frequency 
independence is useful in that it allows us to employ results from the population genetics
literature to calculate the stationary distribution of the evolutionary process,
giving sharper, and sometimes even analytic, results. We demonstrate the utility of 
this approach by applying it to a battle-of-the-sexes game, a Crawford-Sobel signalling 
game, and the beer-quiche game of Cho and Kreps [2]. 
1 Introduction


In evolutionary game theory, games are played within populations, and the prevalence of 
different strategies changes over time according to natural-selection-like dynamics [3-8]. 
This provides a natural method by which to model biological evolution [3] and various 
learning processes [9], and offers a ‘rationality-light’ approach to equilibrium selection [6]. 

In the classical approach, populations are infinitely large and dynamics are deterministic;
the focus is typically on the equilibrium refinement of evolutionary stability [3, 5].
More recently, stochastic finite-population dynamics have been introduced into evolutionary
game theory [1, 7, 10-13]. These often take the form of an ergodic Markov chain (for
example, when there is a positive mutation rate), the state space of which is all possible
strategy compositions of the population [1]. Ranking the various population states’
weights in the stationary distribution is then a natural method of equilibrium selection 
[10, 11], and solves many problems of the deterministic approach. 

A drawback is that the state space is often very large, making calculation of the 
stationary distribution infeasible. Addressing this, Fudenberg and Imhof [1] study the case 
of a symmetric game played within a single, finite population, and show that, when the 
mutation rate is very small, the evolutionary process simplifies significantly. The intuition 
is straightforward: Starting from a pure (monomorphic) population state, we wait a very 
long time for a new strategy to appear in the population, because the mutation rate is 
small. When it does, it either goes extinct or takes over the population (‘fixes’). Because 
this resolution of the mutant’s fate occurs on a much shorter timescale than the waiting 
time for another mutation to occur, it typically re-establishes a pure state. The process 
therefore approximates a simpler process over just the pure states. This dramatic reduction 
of the state space makes calculation of the stationary distribution computationally simple. 

The transition probabilities of this simpler process depend critically on the various
mutants’ fixation probabilities, that is, the probability that a given strategy, having arisen in a
population otherwise pure for a different strategy, subsequently fixes in that population.
Because the game is symmetric, the payoffs that determine these fixation probabilities 
are frequency dependent—the payoff to a mutant strategy changes as its frequency in 
the population increases. For most evolutionary processes, frequency-dependent fixation 
probabilities either do not exist in closed form, or are intractable when they do [7]. This 
significantly limits the analytical use of Fudenberg and Imhof’s result. 

Here, we employ the basic machinery of Fudenberg and Imhof [1] to derive a result 
similar to theirs for asymmetric games. There are several reasons why such a result 
is desirable. First, many situations in which we might want to study evolutionary or 
learning dynamics are best modelled as asymmetric games—for example, signalling games 
[14-16], games of entry and entry-deterrence [17-19], and games of time consistency and 
commitment [20]. Second, because only strict Nash equilibria of asymmetric games are 
evolutionarily stable [21], the deterministic approach based on evolutionary stability often 
fails. This is especially true for multi-stage asymmetric games, which typically have no 
strict Nash equilibria (because alternative strategies that induce the same path of play,
prescribing the same actions on that path but different actions off it, are payoff equivalent). 

In our model, evolution occurs in multiple interacting populations, one for each role in 
the underlying asymmetric game. When the mutation rate is very small, the evolutionary 
process simplifies to one over just the pure states (each population is monomorphic). 
Transition probabilities between pure states in this embedded process again depend on 
the fixation probabilities of single mutants, but these turn out to be much simpler than 
in the symmetric game case of Fudenberg and Imhof. To see this, suppose we start in a 
pure state. A mutant eventually arises in one of the populations, and either goes extinct 
or fixes in that population before another mutant arises in any of the populations. The 
other populations are therefore monomorphic for the duration of the mutant’s extinction 
or fixation. But because the game is asymmetric, payoffs to different strategies in one 
population depend only on the states of the other populations, and so the payoffs that 
determine the fixation probability of the mutant (and therefore the transition probabilities 
in the embedded evolutionary process) are frequency independent. Frequency-independent 
selection is a standard assumption in the population genetics literature [22, 23], and closed- 
form fixation probabilities (exact and approximate) exist for many evolutionary processes 
of interest. Using our result, we can employ these to derive sharper, and sometimes 
even analytical, characterizations of long run evolutionary behaviour in many asymmetric 
games of interest. This allows for powerful evolutionary equilibrium selection in these 
games. 

We illustrate the utility of our result with three examples. First, in a ‘battle of the 
sexes’ game, we show that a closed-form characterization of the stationary distribution is 
possible. Second, we consider a discrete Crawford-Sobel signalling game [15]. We show 
that, when multiple signalling equilibria of differing information content exist for a given 
misalignment of signaller and receiver interests, the most informative is evolutionarily dominant.
This gives a more foundational support to Crawford and Sobel’s heuristic argument
in favour of the most informative signalling equilibria, which they base on Schelling’s [24] 
concept of ‘focal points’. Finally, we apply our methodology to the ‘beer-quiche’ game of 
Cho and Kreps [2], and show that, while it supports the Intuitive Criterion in its selection 
between the two Bayesian Nash equilibria of the game, non-equilibrium states are also 
evolutionarily significant, especially for small population sizes. 

2 Evolution with mutations in multiple finite populations 

Asymmetric games are characterized by the existence of multiple ‘roles’ (‘Player 1’, ‘Player
2’, etc.). In the evolutionary approach, the simplest way to incorporate multiple roles is
to model evolution as occurring in multiple interacting populations [3, 5, 12, 25-29].^1

Suppose that we have an underlying game $\Gamma$ with roles $i = 1, \ldots, I$, each role associated
with a finite strategy set $S_i$, and the payoff to a player in role $i$ when play is
$(s_1, \ldots, s_I) \in \prod_{i=1}^{I} S_i$ given by $\pi_i(s_1, \ldots, s_I) \in \mathbb{R}$.

^1 In Section 5, we discuss how our results relate to the alternative modelling choice of a single population,
in which, each generation, each member draws a role from some distribution.

We assume the existence of $I$ populations, one for each role, with the size of each
population $i$ constant through time at $N_i \in \mathbb{N}$. The overall population state at a given
time is defined as the $I$-tuple of strategy frequencies in the respective populations at that
time: $p^t \in \prod_{i=1}^{I} \Delta^{|S_i|}$, where $\Delta^n$ is the unit simplex in $\mathbb{R}^n$.^2 We shall be interested in the
evolution of this population state over time.

Evolution proceeds as a stochastic process in discrete time. Each generation, each
member of each population receives the expected value of interacting, according to $\Gamma$,
with a group comprising one member from each other population, randomly chosen, and
with each group equally likely. (The use of expected payoffs, rather than true payoffs
received from single random interactions, is for the sake of tractability.) If $p_j^t(s_j)$ denotes
the proportion of members of population $j$ that are playing strategy $s_j \in S_j$ at time $t$,
then, for example, the expected payoff to a member of population 1 who employs strategy
$s_1^k \in S_1$ in period $t$ is

$$E\pi_1(s_1^k \mid p^t) = E\pi_1(s_1^k \mid p^t_{-1}) = \sum_{k_2=1}^{|S_2|} \cdots \sum_{k_I=1}^{|S_I|} p_2^t(s_2^{k_2}) \cdots p_I^t(s_I^{k_I}) \, \pi_1(s_1^k, s_2^{k_2}, \ldots, s_I^{k_I}).$$

Here, $p^t_{-1}$ denotes the population states in all populations other than population 1, and
signifies that the expected payoff to a strategy in population 1 depends only on the strategy
frequencies in the other populations $2, \ldots, I$, a consequence of the asymmetry of the
underlying game.
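The expected-payoff sum above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expected_payoffs`, the callback `payoff`, and the dictionary representation of strategy frequencies are hypothetical choices, and the numbers are borrowed from the man's payoffs in the battle-of-the-sexes game of Example 1.

```python
from itertools import product

def expected_payoffs(payoff, own_strategies, other_freqs):
    """Expected payoff to each strategy of one role.

    payoff(own, others) -> float, where `others` holds one strategy
    per other population (hypothetical callback for illustration).
    other_freqs: one dict per other population, strategy -> frequency.
    """
    result = {}
    for own in own_strategies:
        total = 0.0
        # Sum over every strategy profile of the other roles, weighted
        # by the product of the corresponding strategy frequencies.
        for profile in product(*(freqs.keys() for freqs in other_freqs)):
            weight = 1.0
            for freqs, strategy in zip(other_freqs, profile):
                weight *= freqs[strategy]
            total += weight * payoff(own, profile)
        result[own] = total
    return result

# Man's payoffs (own strategy, woman's strategy), as in Example 1.
man_payoff = {("B", "B"): 1.0, ("B", "R"): 0.0,
              ("R", "B"): 0.0, ("R", "R"): 2.0}
# Assumed women's population state: 25% play B, 75% play R.
women = {"B": 0.25, "R": 0.75}
payoffs = expected_payoffs(
    lambda own, others: man_payoff[(own, others[0])], ["B", "R"], [women])
```

Note that the frequencies within the focal population never enter the calculation, which is the frequency independence exploited later in the paper.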

These expected payoffs in each population $i$ are then translated to non-negative fitnesses
$f_i(s_i \mid p_{-i})$ according to some positive monotonic transformation (possibly different
for each population).^3 In the case of no mutations, the fitnesses within each population can
be used to update that population to its next-period state according to an evolutionary or 
imitation dynamic, usually following the general Darwinian, or ‘monotonicity’, principle 
that strategies with high fitness increase in proportion relative to those with low fitness. 

Some notation: let $\mathcal{P}_i$ denote the (finite) set of all possible population states for population
$i$, let $\mathcal{P} = \prod_{i=1}^{I} \mathcal{P}_i$ denote the set of all possible overall population states, and let
$\mathcal{P}_{-i}$ denote the set of all possible population states for populations other than $i$. The set of
‘pure’ states for population $i$, $\mathcal{P}_i^{\mathrm{pure}}$, comprises all states in $\mathcal{P}_i$ where every member of population
$i$ is playing the same strategy (in which case we say that population $i$ is ‘monomorphic’).
Abusing notation a little, we label such states by the strategy that all members
are playing, i.e., $\mathcal{P}_i^{\mathrm{pure}} = S_i$. Finally, the set of overall pure states, $\mathcal{P}^{\mathrm{pure}} = \prod_{i=1}^{I} \mathcal{P}_i^{\mathrm{pure}}$, is
the set of overall population states in which every population is pure.

The evolutionary process with no mutations in each population $i$ is a stochastic process
$\{X_i^0(t), t = 0, 1, \ldots\}$, with state space $\mathcal{P}_i$, and transition probabilities $T_i^0(p_i, p_i' \mid p_{-i})$ for
$p = (p_i, p_{-i}) \in \mathcal{P}$, $p_i' \in \mathcal{P}_i$. The transition probabilities depend on the population state $p_{-i}$
because this determines fitnesses within population $i$.

^2 Since the populations are finite, $p^t$ is in fact confined to a finite subset of this space.

^3 Popular choices in the evolutionary game theory literature include linear fitness, $f_i(E\pi_i) = 1 + \eta_i E\pi_i$,
and exponential fitness, $f_i(E\pi_i) = \exp(\eta_i E\pi_i)$; in each case, the parameter $\eta_i > 0$ mediates the strength of
selection, i.e., the sensitivity of fitness to changes in expected payoff.

For each population $i$, we require two basic assumptions of this no-mutation evolutionary
process defined by $T_i^0(p_i, p_i' \mid p_{-i})$:

Assumption 1. If in some period a strategy in population $i$ is absent, then it is absent in
all future periods. Formally, for all $(p_i, p_{-i}) \in \mathcal{P}$, $p_i' \in \mathcal{P}_i$, and $s_i \in S_i$, if $p_i(s_i) = 0$ and
$T_i^0(p_i, p_i' \mid p_{-i}) > 0$, then $p_i'(s_i) = 0$.

Assumption 2. No matter the population state of other populations, any strategy currently
played in $i$, unless it is played by all members of $i$, has positive probability of having increased
representation next period. For any $(p_i, p_{-i}) \in \mathcal{P}$, and for each $s_i \in S_i$ such that
$0 < p_i(s_i) < 1$, there exists $p_i' \in \mathcal{P}_i$ such that $p_i'(s_i) > p_i(s_i)$ and $T_i^0(p_i, p_i' \mid p_{-i}) > 0$.


Assumptions 1 and 2 are satisfied by many finite-population stochastic processes studied
in evolutionary game theory and population genetics when there are no mutations,
selection is finitely strong, and fitnesses are positive. These include stochastic models 
of imitation learning [9], the Moran process [30], and the Wright-Fisher process [31, 32]. 
Processes that are excluded include best-response dynamics and fictitious play [9]. 

Loosely, Assumption 1 ensures that the pure states for population i are absorbing. 
In a learning context, it distinguishes imitation learning from other learning processes: 
strategies not employed by anyone in a population cannot be imitated [1, 33, 34]. It is 
also a natural assumption in a biological context; without mutations, the creation of novel 
genes, and therefore novel strategies, is not possible. 

Assumption 2 ensures that non-pure states in population i are transient. This we 
take to be the essence of stochastic dynamics. It is important to note that the ‘positive 
probability’ of Assumption 2 can be very small. It is not restrictive, for example, that 
unsuccessful strategies can spread in a population, since the probability that they do so 
can be appropriately small. 

One context in which Assumption 2 might appear, at first glance, to be too strong is
that of imitation learning in multi-stage games, where some decision nodes are not reached
given the strategies currently employed in the populations, so that ‘play’ at these nodes
cannot be directly observed.^4 This is a problem for Assumption 2 only if a particular
condition holds, which we consider to constitute a somewhat ‘knife-edge’ case: learning is
by imitation based only on direct observation of play.

If imitation can also be based, even if only to a very small degree, on communication
between agents in a population, then actions at currently-unobserved nodes could be
discussed and imitated, and so Assumption 2 would be valid. This we take to be far more
realistic. It is implicit in Young’s [12] assumption that ‘each time an agent plays he starts
afresh and must ask around to find out what is going on’, and a similar logic underlies
many models of social learning (e.g., [35]). Again, we should stress that any amount of
such communication validates Assumption 2; the condition under which it is invalid is
therefore a knife-edge case.^5 Communication is especially relevant for situations where
membership of the populations is not fixed through time, instead being affected by exits
and entries (as modelled by birth-death processes, for example). In this case, if a decision
node is currently unreached for a given population, then new entrants in that population
must nonetheless have strategies that prescribe actions at the unreached nodes; in a pure
imitation dynamics, they can only get these by ‘asking around’.

^4 We are grateful to a referee for emphasizing this point, and prompting the present discussion.

We make the further assumption that the evolutionary processes occur independently 
within each population, in the sense that, although the probability that population i 
transitions from $p_i$ to $p_i'$ between periods $t$ and $t + 1$ depends on the period-$t$ population
states of the other populations, the transitions that these other populations make between
periods $t$ and $t + 1$ do not influence the transition in population $i$. This is similar to the
assumption that expected, rather than realized, payoffs are relevant for fitnesses, in the 
sense that it too is an abstraction from the true, random, matching of players in a given 
period. Like the expected payoffs assumption, it is made for tractability. 

Under this assumption, the no-mutation processes $T_i^0$ aggregate to an overall
no-mutation Markov process $T^0$ over the state space $\mathcal{P}$, where for $p = (p_1, \ldots, p_I)$, $p' =
(p_1', \ldots, p_I') \in \mathcal{P}$, $T^0(p, p') = \prod_{i=1}^{I} T_i^0(p_i, p_i' \mid p_{-i})$.

We now incorporate mutations into this general evolutionary process. We specify
for each population $i$ a mutation rate $\varepsilon\mu_i > 0$, with $\mu_i$ a population-specific parameter
that governs the between-population relative frequency of mutations, and $\varepsilon$ an across-population
parameter governing the overall frequency of mutations. We then alter the
above no-mutation evolutionary process as follows: From a population state $p^t$ in period
$t$, a preliminary (pre-mutation) population state for period $t + 1$ is chosen according
to the transition probabilities $T^0$, i.e., according to the no-mutation evolutionary process.

This preliminary population state is then subjected to random mutations of the following
form: in each population $i$, each member has probability $\varepsilon\mu_i$ of discarding her
strategy and randomly selecting another from the strategy space $S_i$, with each strategy
(including the one she just discarded) equally likely.^6 This mutation process is carried
out independently across the members of a population, and similarly across populations,
resulting in the final population state for period $t + 1$, $p^{t+1}$.
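As a rough illustration of this mutation step, the following Python sketch applies one round of mutation to a single population. The function name `mutate`, the list representation of a population, and the parameter `rate` (standing in for the per-member mutation probability of population $i$) are assumptions made for the example, not notation from the paper.

```python
import random

def mutate(population, strategies, rate, rng):
    """Apply one round of mutation to a list of strategies.

    Each member independently discards its strategy with probability
    `rate` and redraws uniformly from `strategies`, possibly redrawing
    the strategy it just discarded.
    """
    return [rng.choice(strategies) if rng.random() < rate else current
            for current in population]

# A monomorphic population of 20 members playing B, mutated once.
rng = random.Random(0)
before = ["B"] * 20
after = mutate(before, ["B", "R"], rate=0.05, rng=rng)
```

Under this scheme a given member ends the period with one particular new strategy with probability `rate / len(strategies)`, since the redraw is uniform over the whole strategy set.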

^5 It might be objected that changing one’s strategy at an unreached decision node would not alter one’s
payoff, so that imitative strategy changes of this sort would not be expected, but this objection fails to take
into account the fundamental stochasticity of the process: even detrimental strategy changes are expected
to occur with some positive probability.

^6 We can easily allow for the possibility that not all mutations between strategies within a population
are equally likely; this case is discussed in Section 5.
The evolutionary process with mutations can be summarized by the following scheme:
$p^t \to$ selection (stochastic) $\to$ mutation (stochastic) $\to p^{t+1}$.

Within each population $i$, this is a stochastic process governed by the transition probabilities
$T_i^\varepsilon(p_i, p_i' \mid p_{-i})$. These individual population processes aggregate to an overall
Markov process $T^\varepsilon$ over the state space $\mathcal{P}$, defined by the transition probabilities $T^\varepsilon(p, p') =
\prod_{i=1}^{I} T_i^\varepsilon(p_i, p_i' \mid p_{-i})$ (because the independence of the within-population processes is not
compromised by the mutations process we have defined).

Since $\mu_i > 0$ for each population $i$, there is positive probability that, from any given
population state, any state can be reached in one generation (it just requires the appropriate
mutations). Consequently, the evolutionary process $T^\varepsilon(p, p')$ with positive mutation
rates $\mu_i$ is an ergodic Markov chain. It therefore has a unique stationary distribution,
which it approaches in the long run.

In principle, this stationary distribution can be calculated analytically, but in practice,
for many games of interest, the state space (all possible population states) will
be so large that this calculation is infeasible. In general, the size of the state space is
$|\mathcal{P}| = \prod_{i=1}^{I} \binom{N_i + |S_i| - 1}{|S_i| - 1}$. For example, with two populations, each of size 20 members, and
each with 4 strategies available to its members, the size of the state space is approximately
$3 \times 10^6$: calculating the stationary distribution thus involves solving a system of about
$3 \times 10^6$ linear equations. This problem intensifies as the population sizes increase.
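The size of the full state space can be checked with a short Python sketch. It assumes the standard stars-and-bars count of the ways to divide $N_i$ individuals among $|S_i|$ strategies in each population; the function names are illustrative only.

```python
from math import comb, prod

def state_space_size(pop_sizes, num_strategies):
    """Number of overall population states.

    For each population, count the ways to split its members among
    its strategies (stars and bars), then multiply across populations.
    """
    return prod(comb(n + k - 1, k - 1)
                for n, k in zip(pop_sizes, num_strategies))

def num_pure_states(num_strategies):
    """Number of overall pure states: one strategy per population."""
    return prod(num_strategies)

# Two populations of 20 members, each with 4 strategies.
full = state_space_size([20, 20], [4, 4])
pure = num_pure_states([4, 4])
```

For this example the full state space has 3,136,441 states while only 16 of them are pure, and the latter count does not grow with population size.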

In the next section, we employ a theorem of Fudenberg and Imhof [1] to show that,
when the mutation rate is very small for each population ($\varepsilon \ll 1$), the stationary distribution
of the evolutionary process with mutations approximates that of an embedded Markov
process on a much-reduced state space, the set of all pure states $\mathcal{P}^{\mathrm{pure}}$ (the size of which
does not increase with increasing population size). Moreover, the asymmetry of the underlying
game will render selection frequency-independent in the rare-mutations regime.
This will make calculation of the transition probabilities of this embedded Markov chain 
much simpler than for symmetric games. 

3 The stationary distribution when mutations are rare 

Assumptions 1 and 2, which concern the within-population no-mutation evolutionary processes
$T_i^0$, translate into the following two straightforward propositions, stated without
proof, concerning the aggregate no-mutation process $T^0$:

Proposition 1. Under $T^0$, all pure population states $p \in \mathcal{P}^{\mathrm{pure}}$ are absorbing.

Proposition 2. Under $T^0$, all population states $p \in \mathcal{P} \setminus \mathcal{P}^{\mathrm{pure}}$ are transient.
Label pure population states by $s = (s_1, \ldots, s_I) \in \mathcal{P}^{\mathrm{pure}}$; here, all members of population
$i$ play strategy $s_i \in S_i$. Denote by $s/s_i'$ the population state where every population
$j \neq i$ is monomorphic for the strategy $s_j$, and population $i$ is monomorphic for the strategy
$s_i$ except for one individual, who plays $s_i' \neq s_i$. Let the set of all such states be
$\mathcal{P}^{\mathrm{pure}/i}$.

Proposition 3. Fix $s \in \mathcal{P}^{\mathrm{pure}}$, and consider the limit $\lim_{\varepsilon \to 0} T^\varepsilon(s, p)/\varepsilon$ for states $p \in \mathcal{P} \setminus \{s\}$. This
limit exists for all states $p \in \mathcal{P} \setminus \{s\}$. However, $\lim_{\varepsilon \to 0} T^\varepsilon(s, p)/\varepsilon > 0$ if, and only if, $p \in \mathcal{P}^{\mathrm{pure}/i}$
for some $i$. Otherwise, $\lim_{\varepsilon \to 0} T^\varepsilon(s, p)/\varepsilon = 0$.


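To prove this, note that $T^\varepsilon(s, p)$ is a polynomial in $\varepsilon$ for all $p$. For $T^\varepsilon(s, s/s_i')$, this
polynomial has lowest-order term $\varepsilon \mu_i N_i / |S_i|$ (any one of the $N_i$ members of population $i$ mutates,
with probability $\varepsilon\mu_i$, and selects $s_i'$, with probability $1/|S_i|$), and so
$\lim_{\varepsilon \to 0} T^\varepsilon(s, s/s_i')/\varepsilon = \mu_i N_i / |S_i| > 0$. On the other
hand, if the states $s$ and $p$ differ by the strategy played by more than one individual, then
a one-step transition from the former to the latter requires more than one mutation, so
$T^\varepsilon(s, p)$ has lowest-order term of order $\varepsilon^k$, $k \geq 2$. Thus, for such states $p$, $\lim_{\varepsilon \to 0} T^\varepsilon(s, p)/\varepsilon = 0$.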

Proposition 3 states that mutations from a pure state to a state where only one individual
in one of the populations deviates from the pure state are, for small mutation rates,
at least an order of magnitude more likely than other transitions from the pure state (and, 
owing to the pure states being absorbing under the no-mutation process, mutations are 
the only way to transition out of pure states). 

Now suppose that, from a pure state $s$, the system transitions to the state $s/s_i'$. Since
interior states in population $i$ are transient, absent further mutations in population $i$,
the process will absorb either back into the pure state $s$ (the mutant strategy $s_i'$ has
‘gone extinct’) or into the pure state $(s_i', s_{-i}) = (s_1, \ldots, s_{i-1}, s_i', s_{i+1}, \ldots, s_I)$ (the mutant
strategy $s_i'$ has ‘fixed’).

But when the mutation rates are very small, we should expect this extinction or fixation
of strategy $s_i'$ to occur before another mutant appears in population $i$, and indeed before
a mutant subsequently appears in any other population. This latter fact, that no mutant
is expected to appear in any of the other populations during the extinction/fixation event
in population $i$, is key in determining the probability that fixation of $s_i'$ will occur in
population $i$. Because the underlying payoffs to, and thus fitnesses of, strategies $s_i$ (the
‘incumbent’ strategy) and $s_i'$ (the ‘mutant’ strategy) depend only on the population states
in the other populations, and since these are fixed at $s_{-i}$ for the duration of the extinction
or fixation of $s_i'$, the fitness difference between $s_i$ and $s_i'$ is constant for the duration of
this event. Thus, selection is frequency-independent in this regime, a fact that will make 
the calculation of the various fixation probabilities significantly simpler. 

To formalize this intuition, for states $s \in \mathcal{P}^{\mathrm{pure}}$ and $s/s_i' \in \mathcal{P}^{\mathrm{pure}/i}$, let $\rho_i(s_i, s_i' \mid s_{-i})$
be the ‘fixation probability’ that, given
that populations $-i$ remain monomorphic for strategies $s_{-i}$, an $s_i'$ mutant who appears in
population $i$ that is otherwise monomorphic for strategy $s_i$ subsequently fixes. Assumption
2 ensures that this probability is always positive.

Now let $K = |\mathcal{P}^{\mathrm{pure}}| = \prod_{i=1}^{I} |S_i|$, and let $1, \ldots, K$ be some enumeration of the pure
population states.^7 Construct a $K \times K$ transition probability matrix $\Lambda$ as follows:

• If the pure states labelled $m$ and $n$ are $s = (s_i, s_{-i})$ and $s' = (s_i', s_{-i})$ respectively (i.e.,
pure states that differ by only one population’s strategy), then $\Lambda_{mn} = \frac{\mu_i N_i}{|S_i|} \rho_i(s_i, s_i' \mid s_{-i})$.

• If the pure states $m$ and $n$ differ by more than one population’s strategy, $\Lambda_{mn} = 0$.

• Having thus defined $\Lambda_{mn}$ for all distinct $m$ and $n$, define $\Lambda_{mm} = 1 - \sum_{n \neq m} \Lambda_{mn}$.^8

$\Lambda$ is the transition probability matrix for a homogeneous Markov chain over the (finite)
state space $\mathcal{P}^{\mathrm{pure}}$. Moreover, this Markov chain is irreducible, since any pure state can be
reached from any other with positive probability in a number of steps equal to the number
of populations on whose strategies the two pure states differ.
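The construction of the embedded chain can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: it assumes each off-diagonal entry is the arrival rate $\mu_i N_i / |S_i|$ of a given mutant type (as implied by the mutation scheme of Section 2) multiplied by that mutant's fixation probability, and the names `build_embedded_chain`, `stationary_distribution`, and the `fixation` callback are hypothetical. The stationary distribution is approximated by simple repeated multiplication, which is adequate for small matrices whose diagonal entries are positive.

```python
from itertools import product

def build_embedded_chain(strategy_sets, mu, pop_sizes, fixation):
    """Return (states, matrix) for the chain over pure states.

    strategy_sets: one tuple of strategy labels per population.
    mu, pop_sizes: per-population mutation weights and sizes.
    fixation(i, state, mutant): hypothetical callback giving the
        probability that a single `mutant` fixes in population i
        while the other populations stay as in `state`.
    """
    # Lexicographic order: the last population's strategy varies fastest.
    states = list(product(*strategy_sets))
    index = {state: n for n, state in enumerate(states)}
    size = len(states)
    matrix = [[0.0] * size for _ in range(size)]
    for state in states:
        m = index[state]
        for i, strategies in enumerate(strategy_sets):
            # Assumed rate at which a given mutant type arises in population i.
            arrival = mu[i] * pop_sizes[i] / len(strategies)
            for mutant in strategies:
                if mutant == state[i]:
                    continue
                target = state[:i] + (mutant,) + state[i + 1:]
                matrix[m][index[target]] = arrival * fixation(i, state, mutant)
        # The diagonal is still zero here, so this makes the row sum to one.
        matrix[m][m] = 1.0 - sum(matrix[m])
    return states, matrix

def stationary_distribution(matrix, iterations=5000):
    """Approximate the stationary row vector by repeated multiplication."""
    size = len(matrix)
    dist = [1.0] + [0.0] * (size - 1)
    for _ in range(iterations):
        dist = [sum(dist[m] * matrix[m][n] for m in range(size))
                for n in range(size)]
    return dist

# Illustration with neutral mutants, whose fixation probability is 1/N_i.
sizes = [10, 20]
states, matrix = build_embedded_chain(
    [("B", "R"), ("B", "R")], mu=[0.1, 0.2], pop_sizes=sizes,
    fixation=lambda i, state, mutant: 1.0 / sizes[i])
dist = stationary_distribution(matrix)
```

In the neutral illustration every pure state receives equal weight, as one would expect when no strategy is favoured. The mutation weights must be small enough that every diagonal entry is non-negative.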

This establishes the final proposition that we require, that $\Lambda$ induces a unique stationary
distribution on the state space of pure population states [36]:

Proposition 4. There is a unique vector $\lambda = (\lambda_1, \ldots, \lambda_K)$ such that $\lambda_j > 0$ for all $j$,
$\lambda_1 + \cdots + \lambda_K = 1$, and $\lambda\Lambda = \lambda$.

We are now in a position to state our main result. Propositions 1-4 ensure that $T^0$,
$T^\varepsilon$, and $\Lambda$ satisfy Assumptions 6-9 of Fudenberg and Imhof [1]. Employing their Theorem
2,^9 we arrive at the following theorem.

Theorem 1. For each $\varepsilon$, denote by $\lambda^\varepsilon$ the unique stationary distribution of the Markov
process $T^\varepsilon$. If $n$ corresponds, in the enumeration of pure states, to the pure state $s$, then

$$\lim_{\varepsilon \to 0} \lambda^\varepsilon(s) = \lambda_n.$$

That is, the stationary distributions of $T^\varepsilon$ approach $\lambda$ as mutation rates become small.


4 The usefulness of the result 

Our result is useful on two fronts. First, it extends to asymmetric games our ability to
compute the limiting stationary distributions of finite-population evolutionary or imitation

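^7 A particular enumeration that we have found useful: writing $K_i = |S_i|$, enumerate the pure
state $(s_1^{m_1}, \ldots, s_I^{m_I})$ by $n = \sum_{i=1}^{I-1} \left[ \prod_{j=i+1}^{I} K_j \right] (m_i - 1) + m_I$. The strategies in the pure state enumerated $n$ can
then be recovered as follows: $m_I - 1 = (n - 1) \bmod K_I$, and $m_i - 1 = \left\lfloor (n - 1) / \prod_{j=i+1}^{I} K_j \right\rfloor \bmod K_i$ for each $i < I$.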

^8 If there are some $m$ such that $\Lambda_{mm} < 0$ in the above construction of $\Lambda$, one can rescale all mutation
rates $\mu_i$ by an appropriately small factor to render all $\Lambda_{mm} > 0$. Any such rescaling will result in the same
stationary distribution induced by $\Lambda$.

^9 A simple proof of a generalization of Fudenberg and Imhof’s [1] result, holding for more general
evolutionary processes, has recently been given by McAvoy [37].
processes. We have argued that it is these games, and asymmetric multi-stage games in
particular, for which stochastic finite-population dynamics are most relevant. 

Second, the fixation probabilities used to calculate $\Lambda$ derive from frequency-independent
selection. Since frequency-independent selection has long been a standard assumption of
the population genetics literature, we can make use of the many results about fixation probabilities
in that literature. This bridge between evolutionary game theory and classical
population genetics could allow for analytical calculation of the rare-mutations stationary
distribution, where this would be impossible or infeasible in a single-population symmetric
game setup [1] (where the fixation probabilities that compose $\Lambda$ derive from frequency-dependent
selection, and therefore typically do not exist in closed form, or are intractable
when they do [7]^10).

To illustrate this, we consider three examples: a ‘battle of the sexes’ game, a discrete 
Crawford-Sobel signalling game [15], and the ‘beer-quiche’ game of Cho and Kreps [2]. 

Example 1: Battle of the sexes 

The well-known ‘battle of the sexes’ game involves a man and a woman hoping to coordinate
their weekend activities, which are either going to a ballet performance (the woman’s
preference) or going to a rugby match (the man’s preference). Both the man and the 
woman prefer coordination on either activity to not coordinating. The simple example we 
shall study is summarized by the payoff matrix: 

                 Woman
                B       R
Man     B      1,2     0,0
        R      0,0     2,1

To cast this into an evolutionary model, assume two separate populations of men and
women, of size $N_m$ and $N_w$ respectively. Each period, each member of each population goes
either to the ballet or to the rugby, and receives his/her expected payoff from interacting
with a random member of the other population. (This corresponds to members of each
group preferring to be at an event attended by many members of the other group, though
males would prefer this to be at the rugby, and females would prefer it to be at the
ballet; not too unworldly a scenario!)

^10 In the particular case where the evolutionary process is a birth-death process [7, 36], a closed form
for fixation probabilities exists; see, e.g., [36, Sec. 4.7] and [7, Eq. 6.13]. In general, it is a complicated
expression involving the relative fitnesses of the incumbent and mutant strategies at each possible intermediate
frequency of the mutant strategy. One case for which this expression simplifies to a tractable
form is the Moran process, if fitnesses are calculated as exponential functions of game payoffs [38, 39]. For
some other frequency-dependent evolutionary processes, fixation probabilities can be shown to approach
tractable representations as the population size becomes very large (see, e.g., [40]). However, as we shall
discuss in Section 5, the cases to which the ‘rare mutations’ approximation studied here best applies are
specifically those where population sizes are not too large.
Expected payoffs $E\pi$ within both populations translate to fitnesses via the linear transformation
$f_\theta(E\pi) = 1 + \eta_\theta E\pi$, where $\theta$ is either $m$ (‘man’) or $w$ (‘woman’), and $\eta_m$ and
$\eta_w$ are the strengths of selection in the men’s and women’s populations respectively. The
evolutionary or imitation dynamics within each population are assumed to be a Moran
process [7, 30], occurring with mutations in the manner set out in Section 2. The per-person
mutation, or experimentation/error, rates in the men’s and women’s populations
are $\varepsilon\mu_m$ and $\varepsilon\mu_w$ respectively.

In populating the rare-mutations Markov matrix $\Lambda$, we need only consider transitions
between pure states where either the male population’s strategy is different or the female
population’s strategy is different, but not both. For example, consider the transition from
the pure state where the men all go to the rugby and the women all to the ballet, (R, B),
to the pure state where everyone goes to the rugby, (R, R). (The bold font indicates that
these are population strategies.) In the former ‘incumbent’ female population, members
all had fitness $1 + \eta_w(0) = 1$. A mutant woman going instead to the rugby has fitness
$1 + \eta_w(1) = 1 + \eta_w$, and thus has (frequency-independent) selective advantage $\eta_w$ over the
ballet-going members of the female population.

This frequency independence allows us to make use of the well-known formula for
fixation probability under a Moran process [7]: If a single mutant has selective advantage
$s$ over the other members of the (size $N$) population, then its fixation probability is

$$\rho(s) = \frac{1 - 1/(1+s)}{1 - 1/(1+s)^N}$$

for $s \neq 0$, and $\rho(0) = 1/N$. The corresponding formula for the case of frequency-dependent
selection is significantly more complicated [7, 36].
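To see why frequency independence simplifies matters, the following Python sketch compares the closed form with the general birth-death expression, in which the fixation probability is the reciprocal of a sum of cumulative products of down-to-up transition ratios. It assumes the standard Moran-process result that, with constant selective advantage $s$, this ratio equals $1/(1+s)$ at every mutant frequency, so the sum is geometric. The function names are illustrative only.

```python
def fixation_from_ratios(ratios):
    """Fixation probability of a single mutant in a birth-death chain.

    ratios[j - 1] is (probability the mutant count falls) divided by
    (probability it rises) when j mutants are present, j = 1..N-1.
    """
    total = 1.0
    running = 1.0
    for ratio in ratios:
        running *= ratio      # cumulative product up to this frequency
        total += running
    return 1.0 / total

def moran_fixation(s, n):
    """Closed form for constant selective advantage s, population size n."""
    if s == 0:
        return 1.0 / n
    r = 1.0 + s
    return (1.0 - 1.0 / r) / (1.0 - 1.0 / r ** n)

# With frequency-independent selection the ratio is the same at every
# mutant frequency, so the general expression collapses to the closed form.
n, s = 10, 0.1
general = fixation_from_ratios([1.0 / (1.0 + s)] * (n - 1))
closed = moran_fixation(s, n)
```

Under frequency-dependent selection the entries of `ratios` would differ across mutant frequencies, and no such collapse is available in general.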

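The entry of $\Lambda$ corresponding to the transition (R, B) → (R, R) is therefore

$$\Lambda_{(R,B) \to (R,R)} = \frac{\mu_w N_w}{2} \, \rho_w(\eta_w) = \frac{\mu_w N_w}{2} \cdot \frac{1 - 1/(1+\eta_w)}{1 - 1/(1+\eta_w)^{N_w}},$$

where $\rho_w$ denotes the Moran fixation probability in a population of size $N_w$.

For the reverse transition (R, R) → (R, B), mutant women have fitness 1, while
incumbents have fitness $1 + \eta_w$. Mutants are thus at relative selective disadvantage
$s = -\eta_w/(1 + \eta_w)$, and the relevant entry of $\Lambda$ is

$$\Lambda_{(R,R) \to (R,B)} = \frac{\mu_w N_w}{2} \, \rho_w\!\left( \frac{-\eta_w}{1 + \eta_w} \right).$$

The other entries of $\Lambda$ are calculated similarly.
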

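Enumerating the pure states (B, B), (B, R), (R, B), and (R, R) as 1, 2, 3, and 4 respectively,

$$\Lambda = \begin{pmatrix}
1 - \ldots & \mu_w \rho_w\!\left(\frac{-2\eta_w}{1+2\eta_w}\right) & \mu_m \rho_m\!\left(\frac{-\eta_m}{1+\eta_m}\right) & 0 \\
\mu_w \rho_w(2\eta_w) & 1 - \ldots & 0 & \mu_m \rho_m(2\eta_m) \\
\mu_m \rho_m(\eta_m) & 0 & 1 - \ldots & \mu_w \rho_w(\eta_w) \\
0 & \mu_m \rho_m\!\left(\frac{-2\eta_m}{1+2\eta_m}\right) & \mu_w \rho_w\!\left(\frac{-\eta_w}{1+\eta_w}\right) & 1 - \ldots
\end{pmatrix},$$

where the ellipses abbreviate that the rows must each sum to one.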
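
The construction of this matrix, and the computation of its stationary distribution, can be sketched numerically in Python. The payoff tables below are an assumption consistent with the fitnesses quoted above (coordinating at one's preferred venue pays 2, coordinating at the other venue pays 1, and miscoordinating pays 0); the function names, the small pure-Python linear solver, and the parameter values are illustrative.

```python
STATES = [("B", "B"), ("B", "R"), ("R", "B"), ("R", "R")]  # (men, women)
# Assumed payoffs: men prefer rugby, women prefer ballet; miscoordination pays 0.
MEN = {("B", "B"): 1, ("R", "R"): 2}
WOMEN = {("B", "B"): 2, ("R", "R"): 1}


def moran_fixation(s, n):
    """Moran fixation probability of one mutant with selective advantage s."""
    if s == 0:
        return 1.0 / n
    return (1.0 - 1.0 / (1.0 + s)) / (1.0 - (1.0 + s) ** (-n))


def build_lambda(n, eta_m, eta_w, mu_m, mu_w):
    """Rare-mutations transition matrix over the four pure states."""
    lam = [[0.0] * 4 for _ in range(4)]
    for i, (m, w) in enumerate(STATES):
        for j, (m2, w2) in enumerate(STATES):
            if m != m2 and w == w2:      # a mutant man tries to fix
                table, eta, mu, new = MEN, eta_m, mu_m, (m2, w)
            elif m == m2 and w != w2:    # a mutant woman tries to fix
                table, eta, mu, new = WOMEN, eta_w, mu_w, (m, w2)
            else:
                continue                 # same state, or both populations differ
            f_mut = 1.0 + eta * table.get(new, 0)
            f_inc = 1.0 + eta * table.get((m, w), 0)
            lam[i][j] = mu * moran_fixation(f_mut / f_inc - 1.0, n)
        lam[i][i] = 1.0 - sum(lam[i])    # rows sum to one
    return lam


def stationary(p):
    """Stationary row vector of a small irreducible stochastic matrix,
    found by Gauss-Jordan elimination on pi (P - I) = 0 with sum(pi) = 1."""
    n = len(p)
    a = [[p[i][j] - (1.0 if i == j else 0.0) for i in range(n)] + [0.0]
         for j in range(n - 1)]
    a.append([1.0] * n + [1.0])          # normalization replaces one equation
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(n):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [a[i][n] / a[i][i] for i in range(n)]


if __name__ == "__main__":
    lam = build_lambda(n=10, eta_m=0.3, eta_w=0.3, mu_m=0.01, mu_w=0.02)
    for state, weight in zip(STATES, stationary(lam)):
        print(state, round(weight, 4))
```

Changing `mu_m` and `mu_w` while holding the other parameters fixed should leave the printed weights unchanged, since each pair of opposite transitions shares the same mutation rate.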

We have a number of free parameters in this model; to wit: the sizes of, selection
strengths in, and mutation rates in the two populations. As an example, suppose we set
the sizes of, and selection strengths in, the two populations equal at $N$ and $\eta$ respectively.
Making use of the fact that, for the Moran process, $\rho(s)/\rho(-s/[1+s]) = (1+s)^{N-1}$, we
calculate the stationary distribution of the Markov chain defined by $\Lambda$:

$$\lambda = \left( (1+\eta)^{N-1}(1+2\eta)^{N-1},\; (1+\eta)^{N-1},\; (1+2\eta)^{N-1},\; (1+\eta)^{N-1}(1+2\eta)^{N-1} \right) / A,$$

where $A$ is a normalization constant. Notice that, in the rare mutations limit, the mutation
rates, though possibly different in the two populations, do not affect the long-term
distribution of states.
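
One way to verify this stationary distribution is through detailed balance along each edge of the four-state chain. The derivation below is a sketch; it assumes that the off-diagonal entries of $\Lambda$ take the form $\mu\rho(\cdot)$, with the state numbering 1 to 4 given above, and that the two opposite transitions along an edge always involve the same population.

```latex
\begin{align*}
\frac{\lambda_1}{\lambda_2} &= \frac{\Lambda_{21}}{\Lambda_{12}}
  = \frac{\mu_w\,\rho(2\eta)}{\mu_w\,\rho(-2\eta/[1+2\eta])} = (1+2\eta)^{N-1},
&
\frac{\lambda_1}{\lambda_3} &= \frac{\Lambda_{31}}{\Lambda_{13}}
  = \frac{\mu_m\,\rho(\eta)}{\mu_m\,\rho(-\eta/[1+\eta])} = (1+\eta)^{N-1}, \\
\frac{\lambda_4}{\lambda_2} &= \frac{\Lambda_{24}}{\Lambda_{42}}
  = \frac{\mu_m\,\rho(2\eta)}{\mu_m\,\rho(-2\eta/[1+2\eta])} = (1+2\eta)^{N-1},
&
\frac{\lambda_4}{\lambda_3} &= \frac{\Lambda_{34}}{\Lambda_{43}}
  = \frac{\mu_w\,\rho(\eta)}{\mu_w\,\rho(-\eta/[1+\eta])} = (1+\eta)^{N-1}.
\end{align*}
```

The four ratios are mutually consistent, imply $\lambda_1 = \lambda_4$, and contain no mutation rates, since $\mu_m$ and $\mu_w$ cancel edge by edge.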

The proportions of time the populations spend both at the rugby and both at the
ballet are equal, and are higher for larger values of the common selection strength $\eta$
and population size $N$. The intuition for this effect of $\eta$ is straightforward: a higher
$\eta$ increases the fixation probabilities of positively selected mutants, and decreases the
fixation probabilities of negatively selected mutants. In this coordination game, the former
are always mutants leading towards the coordination equilibria (B, B) and (R, R), while
the latter are always mutants leading away from these coordination equilibria.

The effect of population size can most easily be seen from the ratio of transition
probabilities from a non-coordination to a coordination state (positive selection $s$) and
vice-versa (negative selection $-s/[1+s]$): $\rho(s)/\rho(-s/[1+s]) = (1+s)^{N-1}$. This ratio
increases with $N$, and so, for each path into and out of a coordination equilibrium, a
higher $N$ increases the transition probability into, relative to the symmetric probability
out.


Example 2: Crawford-Sobel signalling 

The next game to which we apply our result is a discrete variant of a signalling game from
Crawford and Sobel [15, Sec. 4]. Suppose that there are three possible states of the world,
$\theta \in \{0, 1, 2\}$, with each equally likely. A signaller observes the state of the world, and
sends a costless signal $s \in \{a, b, c\}$ to a receiver, who observes only the signal, and not the
state of the world. Having observed the signal, the receiver makes a decision $r$. Payoffs to
signaller and receiver are as follows:

$$\pi_S(\theta, r) = -(r - \theta - \gamma)^2,$$

$$\pi_R(\theta, r) = -(r - \theta)^2,$$

where $\gamma > 0$ is a parameter that characterizes the signaller and receiver’s misalignment of
interests (for every $\theta$, the receiver’s optimal decision is $\gamma$ lower than the signaller would
most want it to be). For simplicity, we restrict the receiver’s possible decisions $r$ to the
set $\{0, 0.5, 1, 1.5, 2\}$, which covers all possible optimal decisions the receiver could make
given some posterior over the state space, having observed a signal.

For all $\gamma > 0$, Nash equilibria exist where the signaller sends the same signal no
matter the state of the world, and the receiver, observing that signal, makes decision
$r = 1$. We call these equilibria ‘uninformative’, and label them ‘xxx’, since the same
signal $x \in \{a, b, c\}$ is sent for each state of the world $\{0, 1, 2\}$. Also, for all $\gamma$, Nash
equilibria exist where the signaller sends the same signal for states $\theta = 0$ and $\theta = 2$, and
a different signal for state $\theta = 1$; to all sent signals, the receiver responds with decision 1.
Since no practical (decision-changing) information is transmitted by the signaller, we also
call these equilibria, labelled ‘xyx’, ‘uninformative’.

For sufficiently low $\gamma$, there also exist ‘partially informative’ equilibria where, for two
adjacent states of the world (i.e., $\{0, 1\}$ or $\{1, 2\}$), the signaller sends the same signal,
but for the other state of the world, sends a different signal. For such ‘xxy’ and ‘xyy’
equilibria, these threshold values for $\gamma$ are 0.25 and 0.75 respectively.

Finally, for $\gamma < 0.5$, there exist ‘informative’ equilibria, where the signaller sends a
different signal for each state (‘xyz’), and the receiver makes a decision equal to the state
that the signal is sent from.

A full characterization of the Nash equilibria of this game, including the receiver’s 
responses to unsent signals required to sustain each equilibrium, is included in an appendix. 
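
The existence thresholds above can be explored by brute force. The following Python sketch checks whether a given pure strategy pair is a Nash equilibrium under the quadratic payoffs of this example; the function names and the particular strategy profiles tested are illustrative, and the receiver's responses to unsent signals are simply one choice that happens to work.

```python
from itertools import product

STATES = (0, 1, 2)               # equally likely states of the world
SIGNALS = ("a", "b", "c")
DECISIONS = (0, 0.5, 1, 1.5, 2)  # the receiver's restricted decision set


def expected_payoffs(sender, receiver, gamma):
    """Expected (signaller, receiver) payoffs for a pure strategy pair.

    sender:   tuple of three signals, one per state, e.g. ("a", "b", "c")
    receiver: dict mapping each signal to a decision
    """
    u_s = u_r = 0.0
    for theta in STATES:
        r = receiver[sender[theta]]
        u_s -= (r - theta - gamma) ** 2 / 3
        u_r -= (r - theta) ** 2 / 3
    return u_s, u_r


def is_nash(sender, receiver, gamma, tol=1e-12):
    """True if neither role has a profitable pure-strategy deviation."""
    u_s, u_r = expected_payoffs(sender, receiver, gamma)
    for alt in product(SIGNALS, repeat=3):
        if expected_payoffs(alt, receiver, gamma)[0] > u_s + tol:
            return False
    for alt in product(DECISIONS, repeat=3):
        alt_receiver = dict(zip(SIGNALS, alt))
        if expected_payoffs(sender, alt_receiver, gamma)[1] > u_r + tol:
            return False
    return True


if __name__ == "__main__":
    informative = (("a", "b", "c"), {"a": 0, "b": 1, "c": 2})
    pooling = (("a", "a", "a"), {"a": 1, "b": 1, "c": 1})
    for gamma in (0.4, 0.6):
        print(gamma, is_nash(*informative, gamma), is_nash(*pooling, gamma))
```

The informative profile should pass the check at $\gamma = 0.4$ and fail at $\gamma = 0.6$, in line with the threshold of 0.5, while the pooling profile passes at both values.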

Crawford and Sobel [15] argue, somewhat informally, that for a given value of $\gamma$, the
most reasonable equilibria are the most informative ones possible for that $\gamma$. This, they
claim, is because these equilibria are Pareto-superior to less informative equilibria, and
are salient, or ‘focal’ in Schelling’s [24] language, in that they are the most informative
equilibria (the other salient equilibria are the least informative ones, but these are ruled
out on the former grounds of being Pareto-inferior to the most informative equilibria).

The methodology developed in the present paper allows us to test this equilibrium
selection prediction more formally, in the context of learning by agents. Notice that
none of the equilibria that are not perfectly informative is strict, so that a deterministic
infinite-population approach would be of little use here, particularly for higher values of
the misalignment parameter $\gamma$ (for which the informative equilibria do not exist). Instead,
our finite-population approach is better-suited to this game.

We assume two populations, one of signallers and one of receivers. The size of each 
population is $N$. Each signaller is equipped with a response to each possible state of the
world, and each receiver with a response to each possible signal. States of the world are
drawn independently for each individual interaction (i.e., there is no aggregate state of 
the world), and fitnesses are calculated according to expected payoffs. 

Evolution within each population is assumed to be a Wright-Fisher process [31, 32],
which has been used as a model for both biological evolution [41] and imitation
learning [42, 43]. Expected payoffs translate to fitnesses exponentially, $f(E\pi) = \exp(\eta E\pi)$,
with selection strength $\eta$ and per-person mutation rate $\varepsilon\mu$ in both populations.

In constructing $\Lambda$, frequency-independent selection allows us to make use of the well-known
‘diffusion approximation’ formula for the fixation probability, under the Wright-Fisher
process, of a single mutant at selective advantage $s$ in a population of size $N$ [44]:

$$\rho(s) = \frac{1 - \exp(-s)}{1 - \exp(-Ns)}$$

for $s \neq 0$, and $\rho(0) = 1/N$.^^ Again, the case of frequency-dependent selection is
significantly more complicated [47-49].

We use these fixation probabilities to populate $\Lambda$ according to the method set out in
Section 3, and calculate its stationary distribution. Fig. 1 plots, for the case $N = 100$ and
$\eta = 1$, and for various values of the misalignment parameter $\gamma$, the relative frequencies of
equilibria of different information levels in this stationary distribution.^^

It can be seen from Fig. 1 that the results of the learning/evolutionary dynamics in this
game broadly support Crawford and Sobel’s prediction that the most informative equilibria
supportable by a given $\gamma$ are the most reasonable for that $\gamma$. For low levels of misalignment
($\gamma < 0.4$), the informative equilibria dominate, and information transmission is almost
always perfect in the long run. For intermediate levels of misalignment ($0.4 < \gamma < 1$),
partially informative equilibria, especially those of the form xyy, are dominant. For high
levels of misalignment ($\gamma > 1$), only uninformative equilibria can be supported, and indeed
such equilibria dominate the long-run dynamics.

Note that the equilibria involving signalling of the forms xxy and xyx do not have
analogs in the equilibria of the game with continuous state, signal, and decision spaces
[15]; they are artefacts of the discrete structure of the game we have set up. It is reassuring,
then, that they play little role in the long-run dynamics for all values of $\gamma$.

^^The diffusion approximation formula cited above is known to be very accurate [45, 46], and we would 
expect our results to alter very little were we to use near-exact numerical approximations of the true fixation 
probabilities. Such numerical estimation of fixation probabilities is computationally very expensive, which 
further highlights the value of our result: when selection is frequency dependent, as in the single-population 
symmetric-game case, fixation probabilities will usually have to be estimated numerically, whereas in our 
asymmetric-game case, where selection is frequency independent, we may make use of well-known exact 
or approximate closed-form fixation probabilities. 

^^The frequencies that we plot for a given equilibrium type are in fact those of all population states 
whose signalling profile is consistent with that equilibrium type: because the populations are large and 
selection is strong, the plotted frequencies correspond closely with those of the equilibria (which would 
take into account receiver behaviour too).

Figure 1: Frequencies of the signalling profiles of different levels of information transmission
in the long-run dynamics of the Crawford-Sobel game, plotted for various values of the
misalignment parameter $\gamma$. Both signaller and receiver populations are of size $N = 100$;
fitness is exponential in expected payoffs, with equal selection strength $\eta = 1$; mutation
rates are equal in the two populations. The results are broadly consistent with Crawford
and Sobel’s prediction that the most informative equilibria supportable by a given value
of $\gamma$ are the most reasonable for that $\gamma$.

Example 3: The beer-quiche game 

Our final example is the beer-quiche game of Cho and Kreps [2], employed by them to 
illustrate the equilibrium refinement method they advance, the ‘Intuitive Criterion’. The 
extensive form of the game is given in Fig. 2. Player 1 is either a wimp (type $t_w$) or surly
(type $t_s$), with probabilities 0.1 and 0.9 respectively. Player 1 knows his type; player 2
does not. Player 1 either has beer or quiche for breakfast, observed by player 2, who then
chooses whether to fight player 1 or not. The payoffs are such that player 2 should choose
to fight player 1 if the posterior probability he holds that player 1 is a wimp is greater
than 0.5. For any action by player 2, player 1 prefers beer for breakfast if he is surly, but
quiche if he is a wimp. Regardless of player 1’s type, he would prefer to avoid fighting.

The game has two Bayesian Nash equilibria, both of the ‘pooling’ kind: one in which
player 1 eats quiche no matter his type, and one in which player 1 drinks beer no matter
his type. In both cases, player 2 chooses not to fight in response to the observed
behaviour of player 1, but would fight in response to the unobserved behaviour. Both
pooling equilibria are sustained by player 2’s ‘out-of-equilibrium’ belief that, if he were
to observe player 1 having the opposite breakfast to that consumed in equilibrium, there
would be a greater-than-half chance that player 1’s type was wimp. Cho and Kreps’s Intuitive
Criterion, however, rules out the always-quiche equilibrium, by the argument that the
out-of-equilibrium beliefs that player 2 is required to hold do not survive forward-inductive
reasoning [2] [50, Ch. 11.2].

Figure 2: Extensive form setup of the beer-quiche game of Cho and Kreps [2].

The Intuitive Criterion has been criticized as being, in some cases, too rationality- 
heavy [50]. Our methodology allows us to test whether its prediction in the beer-quiche 
game holds up under a rationality-light learning process, where players need not even 
know the other players’ payoffs. 

We assume two populations, one for each role. Evolution proceeds in each population
as a Wright-Fisher process with mutations. The population of player $i$’s, ‘population
$i$’, is of size $N_i$, with selection strength $\eta_i$, exponential fitness $f_i = \exp(\eta_i E\pi)$, and
per-individual mutation rate $\mu_i$. Each member of population 1 has a strategy prescribing his
breakfast choice (beer or quiche) if he turns out to be wimpish (with probability 0.1) and 
if he turns out to be surly (with probability 0.9). Each member of population 2 has a 
strategy prescribing his response (fight or don’t fight) to seeing a member of population 1 
drink beer for breakfast, and to seeing a member of population 1 eat quiche. Each round, 
each member of each population receives his expected ex-ante (i.e., before types are chosen 
in population 1) payoff from interacting with a random member of the other population. 

We label pure population states by the tuple $b(t_w), b(t_s); r(B), r(Q)$: respectively,
breakfast had when wimpish, breakfast had when surly; response to beer-drinking,
response to quiche-eating. For the former two, B and Q represent ‘beer’ and ‘quiche’, while,
for the latter two, F and N represent ‘fight’ and ‘no fight’. Again, the bold font is used
to indicate that these are population strategies.

The weights of the most popular states in the stationary distribution are displayed in
Fig. 3, for the parameter settings $\eta_1 = \eta_2 = 0.2$, $\mu_1 = \mu_2$, and for various population sizes
$N = N_1 = N_2$. For large population sizes ($N > 20$), the pooling equilibrium predicted
by the Intuitive Criterion, BB;NF, is the modal state in the stationary distribution.
For all population sizes, the other pooling equilibrium, ‘all-quiche’, has low weight in the
stationary distribution; this supports its rejection by the Intuitive Criterion.

Figure 3: The frequencies of various population states in the long run dynamics of the
beer-quiche game, plotted for various common population sizes $N = N_1 = N_2$. For
reference, population state ‘QB;NF’ is that where members of population 1 eat quiche
(Q) if wimpish and drink beer (B) if surly, while members of population 2 do not fight
(N) if they see beer-drinking and do fight (F) if they see quiche-eating. The equilibrium
predicted by the Intuitive Criterion, BB;NF, is modal for large ($> 20$), but not for low
($< 20$), population sizes. The equilibrium ruled out by the Intuitive Criterion, QQ;FN,
is infrequent in the long-run dynamics for all population sizes.

Apart from the fact that, of the two Bayesian Nash equilibria, the one predicted by 
the Intuitive Criterion is dominant, it is also of interest that non-equilibrium population 
states occur so frequently in the long run. These states are, in order of their weights in the 
stationary distribution, QB;NF, QB;NN, and BB;NN. Indeed, for small population
sizes ($N < 20$), the state QB;NF has highest weight in the stationary distribution.

The success of these non-equilibrium states is a result of neutral and nearly-neutral 
drift. Starting from the equilibrium state BB; NF, members of population 2 who instead 
play NN achieve the same expected payoff (0.9) as those playing NF, and so can neutrally 
invade the population. If they fix, the pure population state BB; NN is established. From 
this state, members of population 1 who play QB are slightly favoured over the incumbents 
playing BB (expected payoff 3 versus 2.9), and so can invade and fix, establishing the 
pure state QB;NN. From this state, members of population 2 who play NF are slightly 
favoured (expected payoff 1 versus incumbent expected payoff 0.9). If they invade and 
fix, pure state QB;NF is established. But from this pure state, members of population 
1 who play BB are slightly favoured (expected payoff 2.9 versus incumbent 2.8). If they 
invade and fix, the equilibrium pure state BB;NF is re-established. Notice that, because
the reverse directions involve only neutral and slightly disfavourable mutations, they also
occur with non-negligible probability, and are therefore likely to influence the stationary
distribution.
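
The expected payoffs quoted in this paragraph can be reproduced with a short Python sketch. The payoff numbers are an assumption: they are the standard Cho-Kreps values (for player 1, 1 for the preferred breakfast plus 2 for avoiding a fight; for player 2, 1 for fighting a wimp or for not fighting a surly type, and 0 otherwise), which are consistent with the verbal description of the game but are not spelled out in the text. The function name is illustrative.

```python
P_WIMP, P_SURLY = 0.1, 0.9


def expected_payoffs(breakfast, response):
    """Ex-ante expected payoffs (player 1, player 2) for a pure strategy pair.

    breakfast: two characters, the meal if wimpish then the meal if surly,
               each 'B' (beer) or 'Q' (quiche), e.g. "QB"
    response:  two characters, the reply to beer then the reply to quiche,
               each 'F' (fight) or 'N' (no fight), e.g. "NF"
    """
    u1 = u2 = 0.0
    for prob, is_wimp, meal in ((P_WIMP, True, breakfast[0]),
                                (P_SURLY, False, breakfast[1])):
        action = response[0] if meal == "B" else response[1]
        preferred = "Q" if is_wimp else "B"
        # Player 1: 1 for the preferred breakfast, plus 2 if not fought.
        u1 += prob * ((1 if meal == preferred else 0) + (2 if action == "N" else 0))
        # Player 2: 1 for fighting a wimp or for leaving a surly type alone.
        u2 += prob * (1 if (action == "F") == is_wimp else 0)
    return u1, u2


if __name__ == "__main__":
    for b, r in (("BB", "NF"), ("BB", "NN"), ("QB", "NN"), ("QB", "NF")):
        u1, u2 = expected_payoffs(b, r)
        print(f"{b};{r}: player 1 gets {u1:.1f}, player 2 gets {u2:.1f}")
```

Under these assumed payoffs, the four states of the drift cycle give player 1 expected payoffs of 2.9, 2.9, 3 and 2.8, and player 2 expected payoffs of 0.9, 0.9, 0.9 and 1, matching the values used in the argument above.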

The intuition for the fact that increased population size here results in the system 
spending more time in the Nash equilibrium state is similar to that for the same observation 
in the battle of the sexes. When the population size is small, mutants that are weakly 
selected against still have non-negligible probability of fixing, and so transitions out of 
BB;NF to, say, QB;NF (mutant’s expected payoff only 0.1 less than incumbents’) play 
a role in the long-run dynamics. When the population size is very large, mutants that 
are weakly selected against have very little chance of fixing, and so these paths out of 
equilibrium are shut down, leaving only neutral paths such as BB;NF → BB;NN.
Increasing selection strength $\eta$ would have the same effect.

5 Discussion 

Our model involves a number of assumptions and simplifications, four major ones of which 
we discuss below: (i) the assumption of multiple populations, (ii) that, in the no-mutations 
process, only pure states are absorbing, (iii) that mutations can, in reality, be sufficiently 
rare for the evolutionary dynamics to behave like the limiting case, and (iv) that mutation 
rates within populations are uniform. Thereafter, we briefly discuss the relevance of the 
approach developed in this paper for mixed-strategy equilibria. 

On (i), an alternative approach would be to model evolution as occurring in a single 
population, wherein each agent has a strategy for every role [51]. Expected payoffs to 
players could then be computed on the basis of random assignment of roles each period. 

In most learning contexts, the multiple-population setup seems more natural: we think 
of roles as being assigned at the outset, with each agent subsequently learning how best 
to play her assigned role. An example is the battle of the sexes game studied in Section 
4, where the gender of each agent is fixed for the duration of his/her learning period. 
The multiple-population setup is also better suited to modelling the genetical evolution of 
multiple interacting, though reproductively distinct, species. In the context of genetical 
evolution within a single species, however, the more natural model is a single population 
in whose genomes strategies for different roles are encoded at different loci. Strategies are 
then collections of alleles, one for each locus, and are inherited intact (ignoring genetic 
recombination). In the course of the propagation of a strategy, which locus is relevant will 
change from generation to generation, as different roles are taken on (carrier is male or 
female, carrier is the incumbent occupant of a territory or the trespasser, etc.). 

When should we expect the evolutionary dynamics under this single-population model 
to resemble those under our multiple-population model (where each locus, or role, is 
treated as a separate ‘population’)? Here, the answer is simpler for deterministic infinite- 
population dynamics. If there is variation within the population for alleles (/strategies) 
at multiple loci (multiple loci exhibit ‘polymorphism’), then the multiple-population approach
and the single-population approach yield equivalent dynamics under the deterministic
replicator dynamics if a simple condition concerning allele frequencies holds [52]. This
condition, known in the population genetics literature as linkage equilibrium, amounts to 
statistical independence of allele frequencies across loci, and is preserved through time 
under the replicator dynamics [52]. 

In a finite population, polymorphism at multiple loci will be common if the mutation
rate or the population size is sufficiently large. The stochastic nature of the evolutionary
process in a finite population ensures that linkage equilibrium will not always hold, and so
a ‘dynamical equivalence’ result such as that described above is not possible. Nonetheless,
if mutations at different loci occur independently, then in the regime of rare mutations
studied in this paper, it will almost always be the case that at most one locus is polymorphic
in the population. Thus linkage equilibrium will almost always hold, since linkage
disequilibrium between two loci requires that both loci be polymorphic. In the rare mutations
limit, therefore, the dynamics are the same whether we model evolution as occurring
in multiple populations of loci, or in a single multi-locus population.
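
To see why a monomorphic locus rules out linkage disequilibrium, consider two loci with two alleles each. The notation below is introduced here for illustration only: $x_{AB}$ is the frequency of individuals carrying allele $A$ at the first locus and allele $B$ at the second, and $p_A$ and $p_B$ are the marginal allele frequencies.

```latex
\[
D = x_{AB} - p_A\,p_B .
\]
% If the second locus is monomorphic for B, then p_B = 1 and every carrier
% of A also carries B, so x_{AB} = p_A and
\[
D = p_A - p_A \cdot 1 = 0 .
\]
% If it is instead monomorphic for the other allele, then p_B = 0 and
% x_{AB} = 0, so again D = 0.
```

Linkage equilibrium ($D = 0$) therefore holds automatically whenever at most one of the two loci is polymorphic.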

On (ii), we noted that, in multi-stage games, play at unreached decision nodes could 
not be imitated if imitation were a learning process based only on direct observation of 
play. Under this condition, interpreting assumption 2 as applying to imitation processes
seems unjustified: mixed population states in which one population is polymorphic for
an action at an unreached decision node could be maintained in perpetuity. As argued
in Section 2, this condition represents an unrealistic ‘knife-edge’ case. In reality, we expect
agents to communicate about their strategies. The case of imitation only by direct
observation is nonetheless a useful benchmark from which to discuss the influence of the 
general factors that validate assumption 2 for imitation learning processes. Despite the 
obvious importance of such a discussion, it has not, to our knowledge, explicitly appeared 
in the literature on stochastic learning in extensive form games. 

On (iii), how rare do mutations have to be for the population dynamics to resemble
those in our limiting case? A simple heuristic may be derived as follows. Assume all $I$
populations to be of size $N$, with a common individual mutation rate of $\mu$. Consider the
case where, starting from a monomorphic state, a mutant appears in one of the populations.
Under most commonly studied population dynamics (e.g., Wright-Fisher, Moran), the time
that it takes this mutant either to go extinct or fix in its population is of order $N$ or less
[23, 53]. Say that this time is $\alpha N$. Then the probability that another mutant appears
during the extinction/fixation of this mutant is about $\mu N I \times \alpha N$; for this probability to
be below some small threshold $\nu$, we require $\mu < \nu/(\alpha I N^2)$. If this holds, the dynamics
should resemble those for the limiting case $\mu \to 0$.

The bound could probably be loosened for most games, as it can be in the single-population
symmetric game case (Wu et al. [54]): the analogous loosening of the bound
in Wu et al. [54] is from order $1/N^2$ (our heuristic) to order $1/(N \ln N)$. In their case,
this holds for all games except coexistence games, in which mixed equilibria are stable.
The reason for this latter fact is that, if selection is very strong in a coexistence game,
the population can stabilize around the mixed equilibrium for a very long period of time,
long enough for another mutation to occur with non-negligible probability. In the case of
coexistence games, the bound must be tightened to order $N^{-1/2} \exp(-N)$ [54]. In asymmetric
games, an analogous ‘negative feedback’ issue could arise in a situation where two populations
stabilize each other at respective mixed equilibria. A good example is the ‘matching
pennies’ game, where the row player and column player have the same strategy space
(heads and tails); if the strategies played match, the row player gets payoff $+1$ and the
column player gets $-1$, and if they don’t match, the row player gets $-1$ and the column
player $+1$. In our multi-population context, if the column population predominantly plays
heads, the row population moves to predominantly playing heads, which in turn leads to
a decrease in the play of heads in the column population, and a subsequent decrease in
the play of heads in the row population, etc. If selection is strong enough, this situation
could persist for long enough that the chance of another mutation occurring would
be non-negligible. (Recent derivations [55] of fixation probabilities in a Moran process
when multiple populations are polymorphic should be useful in analyzing such cases.) In
this case, a strengthening of the ‘rare mutation’ bound would be required: following the
analysis of Wu et al. [54], we would expect a bound of order $N^{-1/2} \exp(-N)$ to suffice.

In any case, it is clear that our rare-mutations result is most relevant either if mutations 
(or experimentation and errors) occur at a low per-period rate, or if the populations under 
study are small, or both. In learning dynamics, interpretation of this ‘rare mutations’ 
condition is difficult, since the rate of mutations is calibrated to the timescale over which 
strategy revisions are made. Thus, a ‘generation’ might in fact constitute a very short 
period of time, and we might expect experimentation or errors to be very infrequent on 
such a timescale. Interpretation of this condition is easier for genetical evolution, where 
the timescale is in generations, and the probabilities of mutations can be reasonably well 
measured. For example, the point mutation rate at a single nucleotide site in humans
(though known to vary across the genome [56-58], and between the sexes [59]) is of order
$10^{-8}$ per generation [60, 61]. If we set a threshold of $\nu = 0.05$ and $\alpha = 1$, and
consider evolution at two independent loci (‘roles’), then the bound $\mu < \nu/(\alpha I N^2)$ holds
for populations of up to about 1500 individuals.
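
The arithmetic behind this population-size figure can be reproduced by solving the heuristic bound for $N$. The sketch below is illustrative: the function name and default arguments are our own, with the defaults set to the threshold, time constant, and number of loci used in the example above.

```python
import math


def max_population_size(mu, nu=0.05, alpha=1.0, n_populations=2):
    """Largest N for which mu < nu / (alpha * I * N**2), with I = n_populations."""
    return math.sqrt(nu / (alpha * n_populations * mu))


if __name__ == "__main__":
    # Per-site, per-generation mutation rate of order 1e-8.
    print(round(max_population_size(1e-8)))  # 1581
```

With $\mu = 10^{-8}$ the bound gives $N < \sqrt{2.5 \times 10^{6}} \approx 1581$, consistent with the figure of about 1500 individuals.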

On (iv), it may be objected that, in our model, mutation rates within populations are 
uniform: a mutation from any strategy to any other strategy is equally likely. While this 
assumption may be valid in certain genetic contexts, in a learning context we might expect 
certain errors, or examples of experimentation, to be less likely than others [9]. Also, in 
a genetical context, if we include in our concept of mutation the possibility of structural 
changes (e.g., rearrangements, translocations), or if we are interested in the evolutionary 
dynamics of a certain functional genotype relative to all other genotypes (grouped as one 
class), then asymmetric mutation rates would be natural [62, 63]. 

Our result can be generalized in a straightforward way to incorporate heterogeneity
in mutation rates within populations. If we denote by $\varepsilon\mu_i(s_i, s'_i)$ the probability that a
member of preliminary period-$t$ population $i$ currently employing strategy $s_i$ will mutate
to playing $s'_i$ in the finalized period-$t$ population, then the evolutionary process with
mutations is again a Markov chain. It is still the case that, for $s = (s_1, \ldots, s_I) \in \mathcal{P}^{\text{pure}}$,
the corresponding limit is zero if $p \notin \mathcal{P}^{\text{pure}} \cup \mathcal{P}^{\text{pure}/i}$ for some $i$. But now, for
$s/s' \in \mathcal{P}^{\text{pure}/i}$, the limit involves the pair-specific rate $\mu_i(s_i, s'_i)$. The transition
probability matrix $\Lambda$ is then constructed as before.

If it is always the case that $\mu_i(s_i, s'_i) > 0$, then the Markov chain defined by $\Lambda$ is
irreducible, and an analogous form of Theorem 1 goes through as before. If, however, we
allow there to be some $i$, $s_i$, and $s'_i$ such that $\mu_i(s_i, s'_i) = 0$, then the mutation rates are
no longer guaranteed to induce irreducible Markov chains. It is then required that the process
with mutations have a unique stationary distribution for each $\varepsilon > 0$, and that there exists a
unique stochastic vector $\lambda$ such that $\lambda\Lambda = \lambda$, for the analogous Theorem 2 to go through [1, 37].
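
Whether a given specification of mutation rates yields an irreducible chain can be checked mechanically. The sketch below is a generic reachability test on a transition matrix supplied as a list of rows; it is an illustration rather than a procedure from the paper, and the function name is hypothetical.

```python
def is_irreducible(p):
    """True if every state of the chain with transition matrix p (a list of
    rows) can reach every other state with positive probability."""
    n = len(p)

    def reachable(start, forward):
        # Depth-first search along positive-probability transitions,
        # either forwards (i -> j) or backwards (j -> i).
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                weight = p[i][j] if forward else p[j][i]
                if weight > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen

    # Strongly connected iff state 0 reaches all states and all states reach it.
    return len(reachable(0, True)) == n and len(reachable(0, False)) == n


if __name__ == "__main__":
    print(is_irreducible([[0.9, 0.1], [0.2, 0.8]]))  # both directions possible
    print(is_irreducible([[1.0, 0.0], [0.2, 0.8]]))  # state 0 is absorbing
```

A zero mutation rate removes the corresponding off-diagonal entries from the matrix, and a check of this kind indicates whether the remaining transitions still connect all pure states.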

A final point concerns games with mixed-strategy equilibria. In evolutionary game 
theory, two kinds of ‘mixed strategy’ states must be distinguished [64, 65]. The ‘population
kind’ is where individuals within a population each play pure strategies, but different
individuals play different strategies. In our setup, when mutations are rare, the system
spends almost all of the long-run time in pure states (where individuals within each population
all play the same strategy); mixed strategies of the ‘population kind’ are therefore
essentially never observed. The underlying reason is that these polymorphic states are
transient under the no-mutations process. For a different reason, these ‘population kind’ 
mixed states are also excluded by the evolutionary stability concept of infinite-population 
deterministic dynamics in asymmetric games; the component strategies of an equilibrium 
mixed state must have equal fitness, but then any of them could be involved in a ‘neutral 
invasion’ of the state [66]. 

The second kind of mixed strategy state is the ‘individual kind’, and involves the 
individuals of a population all playing the same mixed strategy. Unlike the ‘population 
kind’, such states can be evolutionarily stable in infinite-population dynamics. They 
do, however, raise a problem for our finite-population approach. Allowing individuals to
play any mixed strategy requires an infinite strategy space (the unit simplex in $\mathbb{R}^{|S_i|}$ for
population $i$ with pure-strategy space $S_i$), and therefore an infinitely large state space. A
workaround would be to approximate the infinite strategy space by a discrete lattice.
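
One possible form of such a lattice is the set of mixed strategies whose probabilities are all multiples of $1/k$ for some integer resolution $k$. The Python sketch below enumerates it; the function name and the use of exact fractions are our own choices, and nothing here is prescribed by the text.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(n_strategies, resolution):
    """All mixed strategies over n_strategies pure strategies whose
    probabilities are multiples of 1/resolution."""
    points = []
    for counts in product(range(resolution + 1), repeat=n_strategies - 1):
        if sum(counts) <= resolution:
            # The last coordinate is fixed by the requirement to sum to one.
            full = counts + (resolution - sum(counts),)
            points.append(tuple(Fraction(c, resolution) for c in full))
    return points


if __name__ == "__main__":
    for point in simplex_lattice(2, 4):
        print([str(x) for x in point])
```

The number of lattice points grows quickly with both arguments (15 points for three pure strategies at resolution 4), so the size of the resulting pure-state space limits how fine the lattice can be in practice.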
Appendix table 1: Values of $\gamma$ for which the signalling equilibria of different information transmission levels exist. Sender strategies are of the
form $(s(0), s(1), s(2))$, so that, for example, xyy represents the strategies abb, acc, baa, etc. For each horizontal grey bar, the triplet inside is
the receiver strategy $(r(x), r(y), r(z))$ that supports the Nash equilibrium the grey bar represents. The dark grey bar indicates strict Nash
equilibrium.